Topic Exploration and Distillation for Web Search by a Generalized Similarity Analysis

Authors

  • Xiaoyu Wang
  • Hongwei Wu
  • Li Wei
  • Aoying Zhou
Abstract

Topic distillation is the process of finding representative pages relevant to a given query. Well-known topic distillation approaches such as the HITS algorithm have been shown to be useful in identifying high-quality pages on the most popular topic within a query-specific graph of hyperlinked documents. Many subsequent researchers have focused on augmenting HITS with further content analysis to alleviate the steady deterioration of distillation quality that HITS suffers. In this paper, we revisit the behavior of HITS from a different point of view: a similarity-based analysis model is applied to observe the distillation procedure. By defining a generalized similarity, we propose an algorithm that improves the quality of distillation using only hyperlink information. A topic exploration function is also integrated into the algorithm framework, which enables end users to search for less popular topics when a query involves multiple topics. The experimental results reveal two benefits of the new algorithm: improved distillation quality without using any page content information, and an additional ability to explore the topics emerging in the query results.
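
For background, the standard HITS iteration that the abstract refers to can be sketched as below. This is a minimal illustration of textbook HITS only, not the generalized-similarity algorithm proposed in the paper, whose details are not given here. The function name `hits`, the `links` dictionary format, and the toy graph are assumptions made for illustration.

```python
from math import sqrt


def normalize(scores):
    """Scale a score dictionary to unit Euclidean length (no-op if all zero)."""
    norm = sqrt(sum(v * v for v in scores.values()))
    if norm == 0.0:
        return dict(scores)
    return {page: v / norm for page, v in scores.items()}


def hits(links, iterations=50):
    """Textbook HITS on a graph given as {page: [pages it links to]}.

    Returns (hub, auth), two dictionaries of unit-length scores.
    """
    pages = set(links) | {q for targets in links.values() for q in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority of a page: sum of hub scores of the pages linking to it.
        new_auth = {p: 0.0 for p in pages}
        for p, targets in links.items():
            for q in targets:
                new_auth[q] += hub[p]
        # Hub score of a page: sum of authority scores of the pages it links to.
        new_hub = {p: sum(new_auth[q] for q in links.get(p, [])) for p in pages}
        auth = normalize(new_auth)
        hub = normalize(new_hub)
    return hub, auth


# Hypothetical toy graph: a densely linked "popular" topic (h*, a*)
# and a small, separate, less popular topic (x1 -> b1).
toy_links = {
    "h1": ["a1", "a2"],
    "h2": ["a1", "a2"],
    "h3": ["a1"],
    "x1": ["b1"],
}
hub, auth = hits(toy_links)
for page in sorted(auth, key=auth.get, reverse=True):
    print(f"{page}: authority={auth[page]:.4f} hub={hub[page]:.4f}")
```

In this toy graph the authority score of `b1` shrinks toward zero as the iteration proceeds, which illustrates why plain HITS surfaces only the most popular topic and why a topic exploration function for less popular topics is of interest.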

Similar resources

Topic Distillation with Knowledge Agents

This is the second year that our group participates in TREC’s Web track. Our experiments focused on the Topic distillation task. Our main goal was to experiment with the Knowledge Agent (KA) technology [1], previously developed at our Lab, for this particular task. The knowledge agent approach was designed to enhance Web search results by utilizing domain knowledge. We first describe the generi...

Subsite Retrieval: A Novel Concept for Topic Distillation

Topic distillation is one of the main information needs when users search the Web. In previous approaches to topic distillation, the single page was treated as the basic searching unit. This strategy is inherited from general information retrieval, which has not fully utilized the structure information of the Web. In this paper, we propose a novel concept for topic distillation, named subsite r...

Towards Supporting Exploratory Search over the Arabic Web Content: The Case of ArabXplore

Due to the huge amount of data published on the Web, the Web search process has become more difficult, and it is sometimes hard to get the expected results, especially when the users are less certain about their information needs. Several efforts have been proposed to support exploratory search on the web by using query expansion, faceted search, or supplementary information extracted from exte...

Searching the hypermedia Web: improved topic distillation through network analytic relevance ranking

The Web is a large hypermedia space that is generally explored using search engines. These search engines are evolving to make more effective use of the hypermedia structure of the Web. This paper contributes to this evolution by proposing new methods of topic distillation in structured search based on co-citation and network analysis. We describe a set of 21 network analysis measures of releva...

UIC at TREC-2002: Web Track (Draft)

This is the first year that members of the Database and Information System Lab (DBIS) at University of Illinois at Chicago (UIC) participate in TREC. We participate in two tasks for the Web track: topic distillation and named page finding. Linkage information among documents as well as content information about documents is used in some of our submitted runs. We utilize the Okapi weighting sche...

Publication date: 2002